Introduction to Phylogenies and the Comparative Method

Showing some neat features of R!
Author
Affiliation
Published

March 13, 2025

Note

In this hands-one, you will learn basic tools in R for visualizing phylogenies, testing models of character evolution, performing phylogenetic correction of a regression model, and test for the phylogenetic signal of continuous characters. This lab is based in part on one designed by LukeHarmon for a workshop that he and others ran at Ilha Bela, Brazil; the original can be seen here There are many other useful labs in comparative analysis from that workshop that you can peruse at your leisure.

You will need two datasets, that will be provided for you:

  1. A data.frame with species traits – CDR_traits_DIAZ_imputed.csv

  2. A phylogenetic tree – CDR_timeTree_tacted.nex

  3. A community dataset – CDR_biocon_sample.csv

1 Set up your data and your working directory

For this hands-on, you will need to have a set of R packages to do this lab. Install the following packages:

Code
# Package vector names 
packages <- c("tidyverse", "knitr", "ape", 
              "geiger", "caper", "phytools", 
              "picante", "magrittr", "corrplot", "car") 
Function install.packages()

You can use the function install.packages() to install the packages.

If you don’t want to install the packages one by one, you can use the next command.

Code
# Install packages not yet installed 
# get packages already installed
installed_packages <- packages %in% rownames(installed.packages())

# If the packages are installed skip if not install them
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages], dependencies = TRUE)
}

This command, will, first, check if you already the packages installed, then if a package is not installed in your computer, will install it.

Load installed packages:

Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(knitr)
library(ape)

Attaching package: 'ape'

The following object is masked from 'package:dplyr':

    where
Code
library(geiger)
Loading required package: phytools
Loading required package: maps

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
Code
library(phytools)
library(caper)
Loading required package: MASS

Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select

Loading required package: mvtnorm
Code
library(picante)
Loading required package: vegan
Loading required package: permute
Loading required package: lattice
This is vegan 2.6-8

Attaching package: 'vegan'

The following object is masked from 'package:phytools':

    scores

Loading required package: nlme

Attaching package: 'nlme'

The following object is masked from 'package:dplyr':

    collapse
Code
library(magrittr)

Attaching package: 'magrittr'

The following object is masked from 'package:purrr':

    set_names

The following object is masked from 'package:tidyr':

    extract
Code
# Aux function for visualization
theme_CDR <- function() {
  theme_bw() + #base_family = "Noto Sans") +
    theme(panel.grid.minor = element_blank(),
          plot.background = element_rect(fill = "white", color = NA),
          strip.background = element_rect(fill = "grey80", color = NA),
          legend.title = element_text(face = "bold", size = 15), 
          legend.text = element_text(size = 12))
}

Set up a working directory. Tell R that this is the directory you will be using, and read in your data:

Function getwd()

You can use the function getwd() to get the current working directory.

Code
setwd("path/for/your/directory")
Function dir.create()

You can use the function dir.create() to get create a series of folders within your working directory. For example, if you run dir.create(“Output”) it will create an empty folder–named Output–within your working directory. This folder then can be used to store the results from the lab.

Load the data. Instead of reading files from the computer we will pull the required data directly from the internet.

Code
## Trait data
traitData <- read_csv("https://raw.githubusercontent.com/jesusNPL/Workshop_PCM_HandsOn/refs/heads/main/Data/CDR_traits_DIAZ_imputed.csv") %>% 
  mutate(species = tipLabel) %>% 
  column_to_rownames("tipLabel")  # we are using the column "tipLabel" as rownames
Rows: 582 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (19): kingdom, phylum, majorGroup, class, order, family_PHY, family_WFO,...
dbl (10): IN_Phylogeny, leaf_area, Nmass, LMA, plant_height, diaspore_mass, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
## Phylogenetic data
phyData <- force.ultrametric(read.nexus("https://raw.githubusercontent.com/jesusNPL/Workshop_PCM_HandsOn/refs/heads/main/Data/CDR_timeTree_tacted.nex"))
***************************************************************
*                          Note:                              *
*    force.ultrametric does not include a formal method to    *
*    ultrametricize a tree & should only be used to coerce    *
*   a phylogeny that fails is.ultrametric due to rounding --  *
*    not as a substitute for formal rate-smoothing methods.   *
***************************************************************
Code
is.ultrametric(phy = phyData)
[1] TRUE
The pipe (%>%) operator

This operator is, maybe, the most used operator from the {dplyr} package and is used to perform a sequence of operations on a data frame. In other words, the pipe operator simply feeds the results of one operation into the next operation below it.

Previous lines of code will only work if you have set your working directory (WD) and only if you have the folder Data within the WD. You can check the Intro-R lab for more details.

OK. You should be ready to go.

Let’s inspect the data first, to do that we will use the function “glimpse()” of the R package {dplyr}

Code
glimpse(traitData)
Rows: 582
Columns: 29
$ kingdom              <chr> "Plantae", "Plantae", "Plantae", "Plantae", "Plan…
$ phylum               <chr> "Tracheophyta", "Tracheophyta", "Tracheophyta", "…
$ majorGroup           <chr> "G", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
$ class                <chr> "Ginkgoopsida", "Magnoliopsida", "Magnoliopsida",…
$ order                <chr> "Ginkgoales", "Asterales", "Fabales", "Gentianale…
$ family_PHY           <chr> "Ginkgoaceae", "Asteraceae", "Fabaceae", "Apocyna…
$ family_WFO           <chr> "Ginkgoaceae", "Asteraceae", "Fabaceae", "Apocyna…
$ genus_PHY            <chr> "Ginkgo", "Achillea", "Amphicarpaea", "Apocynum",…
$ genus_WFO            <chr> "Ginkgo", "Achillea", "Amphicarpaea", "Apocynum",…
$ specificEpithet      <chr> "biloba", "millefolium", "bracteata", "androsaemi…
$ sciname_IN_WFO       <chr> "Ginkgo biloba", "Achillea millefolium", "Amphica…
$ sciname_IN_Phylogeny <chr> "Ginkgo biloba", "Achillea millefolium", "Amphica…
$ IN_Phylogeny         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ Imputed              <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "…
$ sciname_IN_DIAZ      <chr> "Ginkgo biloba", "Achillea aspleniifolia", "Amphi…
$ sciname_IN_BIEN      <chr> "Ginkgo biloba", "Achillea millefolium", NA, "Apo…
$ Woodiness            <chr> "woody", "non-woody", "non-woody", "non-woody", "…
$ growth_form          <chr> "tree", "herbaceous non-graminoid", "climber", "h…
$ leaf_type            <chr> "broadleaved", "broadleaved", "broadleaved", "bro…
$ leaf_area            <dbl> 1336.0000, 2701.6550, 1459.3500, 7156.8200, 2598.…
$ Nmass                <dbl> 20.30000, 16.97002, 30.41157, 17.25568, 17.24000,…
$ LMA                  <dbl> 92.28989, 71.61548, 26.57855, 51.59986, 97.94319,…
$ plant_height         <dbl> 32.8258460, 1.1378405, 0.8581601, 0.7358208, 0.95…
$ diaspore_mass        <dbl> 1083.930700, 0.100000, 30.792000, 0.228900, 0.951…
$ SSD_observed         <dbl> 0.4500000, 0.2966251, 0.2515797, 0.3535481, 0.285…
$ LDMC                 <dbl> 0.3493534, 0.2923324, 0.2616435, 0.3345541, 0.281…
$ SSD_imputed          <dbl> 0.3121684, 0.2493996, 0.2348250, 0.2711764, 0.245…
$ SSD_combined         <dbl> 0.4500000, 0.2530428, 0.2436388, 0.2711764, 0.249…
$ species              <chr> "Ginkgo_biloba", "Achillea_millefolium", "Amphica…

Let’s check if our trait data contain the same species as the phylogeny

Code
tmp <- name.check(phy = phyData, data = traitData)

# print the results
tmp
$tree_not_data
 [1] "Andropogon_gerardii"         "Anemone_patens"             
 [3] "Arabis_divaricarpa"          "Capsellabursa-pastoris"     
 [5] "Colladonia_sp."              "Demia_testudo"              
 [7] "Desmodium_glutinosum"        "Echinacea_serotina"         
 [9] "Eupatorium_maculatum"        "Festuca_guestphalica"       
[11] "Galium_asperellum"           "Galium_trifolium"           
[13] "Gnaphalium_purpureum"        "Helianthemum_bicknellii"    
[15] "Lithospermum_caroliniense"   "Miscanthus_giganteus"       
[17] "Panicum_perlongum"           "Parthenocissus_quinquefolia"
[19] "Petalostemon_candidus"       "Petalostemon_purpureus"     
[21] "Pilosella_longipila"         "Saxifraga_pensylvanica"     
[23] "Symphyotrichumnovae-angliae" "Trientalis_borealis"        

$data_not_tree
[1] "Capsella_bursa-pastoris"      "Deamia_testudo"              
[3] "Galium_asprellum"             "Galium_bifolium"             
[5] "Ginkgo_biloba"                "Heptaptera_sp."              
[7] "Miscanthus_longiberbis"       "Symphyotrichum_novae-angliae"

It indicates that several species are not present either in the trait data or phylogeny, so let’s drop this species from the phylogeny and remove from the trait data. To do that we will use the function drop.tip() of the package {ape}

Code
phyData <- drop.tip(phy = phyData, tip = tmp$tree_not_data)

traitData <- traitData %>% 
  filter(species %in% phyData$tip.label)

We can double check if our data match after dropping the missing species

Code
name.check(phy = phyData, data = traitData)
[1] "OK"

Another option is to use the function “comparative.data” of the R package {caper}

Code
## Create comparative.data object for further analyses
compCDR <- comparative.data(
  phy = phyData,
  data = traitData,
  names.col = "species",
  vcv.dim = TRUE,
  warn.dropped = FALSE
)

Anyway, now it seems that we are ready to go!

2 Working with trees

Let’s start by looking at the phylogeny of these birds and learning a bit about how to work with trees in R.

What does your tree look like?

Code
plot(phyData)

Answer: Whoa. That’s ugly. Let’s clean it up.

Code
plot.phylo(phyData, 
           no.margin = TRUE, 
           cex = 0.5)

Better. You can mess around with tree plotting functions in plot.phylo() as much as you’d like. Try this for example:

Code
plot.phylo(phyData, 
           type = "fan", 
           no.margin = TRUE, 
           cex = 0.3)

Much much better.

It may be useful to understand how trees are encoded in R. Typing in just the name of the tree file like this:

Code
phyData

Phylogenetic tree with 574 tips and 573 internal nodes.

Tip labels:
  Juniperus_communis, Juniperus_sp., Juniperus_virginiana, Picea_sp., Larix_laricina, Pinus_strobus, ...

Rooted; includes branch lengths.

will give you basic information about the phylogeny: the number of tips and nodes; what the tips are called; whether the tree is rooted; and if it has branch lengths.

Code
str(phyData)
List of 4
 $ edge       : int [1:1146, 1:2] 575 576 577 577 578 578 576 579 580 580 ...
 $ edge.length: num [1:1146] 1.46e+02 1.22e+02 5.04 6.54e-03 5.03 ...
 $ Nnode      : int 573
 $ tip.label  : chr [1:574] "Juniperus_communis" "Juniperus_sp." "Juniperus_virginiana" "Picea_sp." ...
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"
 - attr(*, "RSS")= num 2.77e-07

will tell you more about tree structure. Trees consist of tips connected by edges (AKA branches)

Code
phyData$tip.label

gives you a list of all your terminal taxa, which are by default numbered 1-n, where n is the number of taxa.

Code
phyData$Nnode

gives you the number of nodes. This is a fully bifurcating rooted tree, so it has 1 fewer node than the number of taxa.

Code
phyData$edge

This tells you the beginning and ending node for all edges.

Put that all together with the following lines

Code
plot.phylo(phyData, 
           type = "fan", 
           no.margin = TRUE, 
           cex = 0.7, 
           label.offset = 0.1, 
           show.tip.label = FALSE)

nodelabels(cex = 0.5)

tiplabels(cex = 0.5)

There are many ways to manipulate trees in R using {ape}, {Phytools}, and other packages. This just gives you a bare-bones introduction.

3 Working with a data matrix and testing hypotheses in a phylogenetically informed way

Let’s ask some questions using the trait data that were measured for these birds. First, explore the data in the “traitData” matrix. Here are some options for visualizing data matrices:

Code
traitData %>% 
  head() # this will show you the first few rows of your data matrix and its header

traitData %>% 
  dimnames() # this will show you the row and column headers for your matrix

traitData %>% 
  View() # this will let you visualize the entire matrix

leaf area is one of your variables found in the trait dataset. Let’s isolate it so we can work with it easily:

Code
leaf_area <- traitData[, "leaf_area"] 
names(leaf_area) <- rownames(traitData) 
# data vectors have to be labelled with tip names for the associated tree. 
# This is how to do that. 
Exploring the data

It is good practice to check the distribution of your data before doing downstream analysis.

Code
hist(leaf_area)

What about if we log scale the HWI?

Code
hist(log(leaf_area))

Does it look different/similar?

In the lecture, we talked about one model of character evolution, called a Brownian Motion model. This model assumes that a trait evolves from a starting state (z0) according to a random walk with the variance specified by the rate parameter \(\sigma^{2}\) (sigma-squared). In short, Brownian motion describes a process in which tip states are modeled under the assumption of a multivariate normal distribution. On a phylogeny, the multivariate mean of tip states is equal to the root state estimate, and variance accumulates linearly through time.

What does Brownian Motion evolution of leaf area look like?

Code
brownianModel <- fitContinuous(phy = phyData, 
                               dat = log(leaf_area))

brownianModel # this will show you the fit statistics and parameter values
GEIGER-fitted comparative model of continuous data
 fitted 'BM' model parameters:
    sigsq = 35.449783
    z0 = 6.127122

 model summary:
    log-likelihood = -2534.051403
    AIC = 5072.102805
    AICc = 5072.123821
    free parameters = 2

Convergence diagnostics:
    optimization iterations = 100
    failed iterations = 0
    number of iterations with same best fit = 100
    frequency of best fit = 1.000

 object summary:
    'lik' -- likelihood function
    'bnd' -- bounds for likelihood search
    'res' -- optimization iteration summary
    'opt' -- maximum likelihood parameter estimates

Answer: The estimated ancestral state (z0) under Brownian Motion evolutionary model suggests that leaf area was ~6.127122. This value is similar to the current mean trait log-value (\(\mu\) = 6.87). This model also suggests that leaf area is evolving at a rate \(\sigma^{2}\) = 35.45.

Code
mean(log(leaf_area))
[1] 6.868803

Here, you can see the estimates for ancestral state (z0), and the rate parameter (\(\sigma^{2}\)), as well as some measures of model fit. The fit of the model is determined using maximum likelihood, and expressed as a log likelihood (lnL). The higher the lnL, the more probable the data given the model. However, when comparing different models, we can’t use the lnL, because it does not account for the difference in the number of parameters among models. Models with more parameters will always fit better, but do they fit significantly better? For example an OU model has 4 parameters (alpha [\(\alpha\)], theta [\(\theta\)], z0, and sigma-squared [\(\sigma^{2}\)]), so it should fit better than a BM model, which includes only z0 and \(\sigma^{2}\). To account for this, statisticians have developed another measure of fit called the AIC (Akaike Information Criterion): AIC = (2xN)-2xlnL, where N is the number of parameters. This penalizes the likelihood score for adding parameters. When selecting among a set of models, the one with the lowest AIC is preferred. We will use this information later on in this lab.

In addition to assessing model fit, we can use the Brownian Motion model to reconstruct ancestral states of a character on a tree. To visualize what BM evolution of this trait looks like on a tree. The contMap() command in {phytools} estimates the ancestral states and plots them on a tree.

Code
## Calculate number of trait shifts
obj <- contMap(phyData, 
        log(leaf_area), 
        fsize = 0.1, 
        lwd = 2, 
        type = "fan", 
        plot = FALSE)

# change colors
obj <- setMap(obj, 
              c("white", "#FFFFB2", "#FECC5C", "#FD8D3C", "#E31A1C")) 

# Plot the results
plot(obj, 
     fsize = c(0.2, 0.8), 
     leg.txt = "Leaf area")

Describe the evolution of Hand-wing index on this tree. How many times have extremely high and extremely low Hand-wing index evolved on this tree?

Answer: This phylogenetic tree contains 249 species, and the mapped values range between 1.96 and 3.57. By visual inspection, overall, the HWI tended to evolve more low values than high values; indeed, HWI seems to evolve high values in about 5-7 branches. The species with higher HWI index are Berlepschia rikeri, Geositta antarctica, Geositta isabellina, Geositta saxicolina, and Geositta maritima.

Code
plot(obj, 
      fsize = c(0.6, 0.8), 
      type = "fan", 
      leg.txt = "Hand-Wing Index")

What does this say about our ability to test hypotheses about the evolution of leaf area?

Answer: Although the Brownian Motion model is simple, it helps us understand how fast or slow a specific attribute is evolving and to identify branches with different evolutionary regimes.

Let’s go ahead and test some hypotheses. Plant height is another trait in your data matrix. Let’s assess whether there is a correlation between leaf area and plant size?

Code
plant_height <- traitData[, "plant_height"]

names(plant_height) <- rownames(traitData)

Let’s see if range size follow a normal distribution

Code
hist(log(plant_height))

Let’s look at a plot of range size as a function of Hand-wing index.

Code
traitData %>% 
  ggplot(aes(x = log(plant_height), y = log(leaf_area))) + 
  geom_point(alpha = 0.5, color = "darkgray", size = 3) +  
  labs(x = "log(Plant height)", y = "log(Leaf area)") + 
  theme_CDR()

Hm. looks promising.

How would you describe the relationship between these two variables?

Answer: At first glance, it seems that there is a positive association between leaf area and plant height.

Let’s be more quantitative in describing that relationship with a linear model.

Code
lmModel <- lm(log(leaf_area) ~ log(plant_height)) 

summary(lmModel)

Call:
lm(formula = log(leaf_area) ~ log(plant_height))

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7333 -0.3098  0.1543  0.5337  3.0931 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        6.89287    0.04276  161.19   <2e-16 ***
log(plant_height)  0.38352    0.03657   10.49   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.023 on 572 degrees of freedom
Multiple R-squared:  0.1613,    Adjusted R-squared:  0.1598 
F-statistic:   110 on 1 and 572 DF,  p-value: < 2.2e-16
Code
traitData %>% 
  ggplot(aes(x = log(plant_height), y = log(leaf_area))) + 
  geom_point(alpha = 0.5, color = "darkgray", size = 3) +  
  labs(x = "log(Plant height)", y = "log(Leaf area)") + 
  geom_smooth(method = "lm") + 
  theme_CDR()
`geom_smooth()` using formula = 'y ~ x'

The coefficients table from the summary() command shows the slope and intercept for the linear model describing leaf area as a function of plant height. Each line shows the estimated coefficient (Estimate), the standard error (Std. Error) of that estimate, as well as a t-statistic and associated p-value, testing whether those parameters are equal to 0. The Multiple R-squared is an estimate of how much variance in the response variable can be explained by the predictor variable.

Write the linear model for this relationship. Are the parameters significantly different from 0?

Answer: LA(y) ~ \(\alpha\) = 6.89287 + \(\beta\) = 0.38352(log(plant height)) + \(\epsilon\)

The coefficients (\(\alpha\) = Intercept and \(\beta\) = Slope) of this model are different from zero, that is, the model suggest that there is a positive association between the two variables (Leaf area ~ plant height).

What is the R^2 value for this data?

Answer: Despite the coefficients of the model being different from zero, the adjusted R^2 = 0.16 meaning that plant height explains 16% of the variation of leaf area.

How do you feel about that?

3.1 Phylogenetic regression (PGLS)

Nice. But, we have not considered the fact that these plants in Cedar Creek are related to each other. As such, they may share similar trait values simply due to the fact that their ancestors had large leaf area and heigth or the reverse. In other words, we need to account for non-independence of residuals due to phylogeny. One way to do that is to use phylogenetic-generalized-least-squares regression (PGLS).

Code
pglsModel <- pgls(log(leaf_area) ~ log(plant_height), lambda = "ML", data = compCDR)

summary(pglsModel)

Call:
pgls(formula = log(leaf_area) ~ log(plant_height), data = compCDR, 
    lambda = "ML")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.33751 -0.03045  0.01565  0.05670  0.26769 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.575
   lower bound : 0.000, p = 4.4409e-16
   upper bound : 1.000, p = < 2.22e-16
   95.0% CI   : (0.398, 0.721)
delta  [Fix]  : 1.000

Coefficients:
                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)       5.332669   0.616152  8.6548 < 2.2e-16 ***
log(plant_height) 0.480629   0.054787  8.7727 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08438 on 376 degrees of freedom
Multiple R-squared: 0.1699, Adjusted R-squared: 0.1677 
F-statistic: 76.96 on 1 and 376 DF,  p-value: < 2.2e-16 
Code
summary(pglsModel)

Call:
pgls(formula = log(leaf_area) ~ log(plant_height), data = compCDR, 
    lambda = "ML")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.33751 -0.03045  0.01565  0.05670  0.26769 

Branch length transformations:

kappa  [Fix]  : 1.000
lambda [ ML]  : 0.575
   lower bound : 0.000, p = 4.4409e-16
   upper bound : 1.000, p = < 2.22e-16
   95.0% CI   : (0.398, 0.721)
delta  [Fix]  : 1.000

Coefficients:
                  Estimate Std. Error t value  Pr(>|t|)    
(Intercept)       5.332669   0.616152  8.6548 < 2.2e-16 ***
log(plant_height) 0.480629   0.054787  8.7727 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.08438 on 376 degrees of freedom
Multiple R-squared: 0.1699, Adjusted R-squared: 0.1677 
F-statistic: 76.96 on 1 and 376 DF,  p-value: < 2.2e-16 
Code
coef(pglsModel)
      (Intercept) log(plant_height) 
        5.3326689         0.4806286 
Code
traitData %>% 
  ggplot(aes(x = log(plant_height), y = log(leaf_area))) + 
  geom_point(alpha = 0.5, color = "darkgray", size = 3) +  
  labs(x = "log(Plant height)", y = "log(Leaf area)") + 
  geom_abline(intercept = coef(pglsModel)[1], slope = coef(pglsModel)[2], 
              color = "red", linewidth = 1.5) + 
  theme_CDR()

Code
# will plot the pgls regression line on your biplot.
Code
## Loglik of the PGLS model
logLik(pglsModel)
'log Lik.' -554.4899 (df=2)
Code
## Estimate null model or intercept-only model
y <- as.numeric(fitted(pglsModel) + resid(pglsModel))

nullModel <- lm(y ~ 1)

## loglik null model
logLik(nullModel)
'log Lik.' -613.6878 (df=2)
Code
## Is higher than a simple null model
logLik(pglsModel) > logLik(nullModel)
     [,1]
[1,] TRUE
Code
## Is higher than the lm model
logLik(pglsModel) > logLik(lmModel)
     [,1]
[1,] TRUE

Let’s use Information Criteria and see which model performed the best.

Code
AIC(lmModel)
[1] 1659.054
Code
AIC(nullModel)
[1] 1231.376
Code
AIC(pglsModel)
[1] 1112.98

Or we can make our model comparisons more interpretable.

Code
mods <- c(AIC(lmModel), AIC(nullModel), AIC(pglsModel))
names(mods) <- c("OLS", "Null", "PGLS")

aics <- aicw(mods) 
names(aics)[1] <- "AIC"

aics$lnL <- c(logLik(lmModel), logLik(nullModel), logLik(pglsModel))

kable(aics)
AIC delta w lnL
OLS 1659.054 546.0736 0 -826.5268
Null 1231.376 118.3957 0 -613.6878
PGLS 1112.980 0.0000 1 -554.4899

Seems that PGLS is outperforming the OLS, let’s plot the results:

Code
traitData %>% 
  ggplot(aes(x = log(plant_height), y = log(leaf_area))) + 
  geom_point(alpha = 0.5, color = "darkgray", size = 3) +  
  labs(x = "log(Plant height)", y = "log(Leaf area)") + 
    geom_smooth(method = "lm", se = TRUE, linewidth = 1.5) + # OLS slope
  geom_abline(intercept = coef(pglsModel)[1], slope = coef(pglsModel)[2], # PGLS slope
              color = "red", linewidth = 1.5) + 
  theme_CDR()
`geom_smooth()` using formula = 'y ~ x'

Code
# will plot the pgls regression line on your biplot.

How do you feel about that?

4 Phylogenetic signal

Phylogenetic signal is the tendency of related species to resemble each other more than species drawn at random from the same tree.

4.1 Blomberg’s K

Blomberg’s K compares the variance of PICs to what we would expect under a Brownian motion (BM) model of evolution. K = 1 means that close relatives resemble each other as much as we should expect under BM. K < 1 that there is less phylogenetic signal than expected under BM and K > 1 means that there is more. In addition, a significant p-value returned from a randomization test tells us that the phylogenetic signal is significant, in other words, close relatives are more similar than random pairs of taxa in the dataset.

Code
# Run Blomberg's K
K_PH <- phylosig(tree = phyData, # Phylogeny
                  x = log(plant_height), # trait
                  method = "K", # method
                  test = TRUE)

# Print results
print(K_PH)

Phylogenetic signal K : 0.00112205 
P-value (based on 1000 randomizations) : 0.003 
Code
# Plot results
plot(K_PH)

4.2 Pagel’s Lambda

Pagel’s \(\lambda\) is a tree transformation that stretches the tip branches relative to internal branches, making the tree more and more like a complete polytomy of a star phylogeny. If \(\lambda = 0\) there is no phylogenetic signal, while \(\lambda = 1\) correspond to BM and \(0 < \lambda < 1\) in between.

Code
# Run Pagel's Lambda
LB_PH <- phylosig(tree = phyData, 
                  x = log(plant_height), 
                  method = "lambda", 
                  test = TRUE)

# Print the results
print(LB_PH)

Phylogenetic signal lambda : 0.794384 
logL(lambda) : -741.291 
LR(lambda=0) : 324.333 
P-value (based on LR test) : 1.64816e-72 
Code
# Plot thre results
plot(LB_PH)

Describe the results of phylogenetic signal. Does plant height presents phylogenetic signal?

5 Model Fitting

Brownian Motion is only one model of evolution for a continuous variable. This model assumes that a trait evolves from a starting state (z0) according to a random walk with the variance specified by the rate parameter \(\sigma^{2}\) (sigma-squared). In short, Brownian motion describes a process in which tip states are modeled under the assumption of a multivariate normal distribution. On a phylogeny, the multivariate mean of tip states is equal to the root state estimate, and variance accumulates linearly through time.

Another model is the Ornstein-Uhlenbeck (OU) model, which allows the trait mean to evolve towards a new state (theta), with a selective force (alpha). These two new parameters, plus the starting state (z0) and the rate of evolution (sigsq) parameters from the BM model, make for a 4-parameter model.

The Early Burst (EB) model allows the rate of evolution to change across the tree, where the early rate of evolution is high and declines over time (presumably as niches are filled during an adaptive radiation. The rate of evolution changes exponentially over time and is specified under the model r[t] = r[0] x exp(a x t), where r[0] is the initial rate, a is the rate change parameter, and t is time. The maximum bound is set to -0.000001, representing a decelerating rate of evolution. The minimum bound is set to \(log(10^{-5})\)/depth of the tree.

Let’s evaluate the relative fit of these three models to the Hand-wing index trait.

5.1 Brownian Motion (BM)

Code
BMModel <- fitContinuous(phy = phyData, # phylogeny 
                         dat = log(leaf_area), # trait 
                         model = "BM") # evolutionary model

BMModel
GEIGER-fitted comparative model of continuous data
 fitted 'BM' model parameters:
    sigsq = 35.449783
    z0 = 6.127122

 model summary:
    log-likelihood = -2534.051403
    AIC = 5072.102805
    AICc = 5072.123821
    free parameters = 2

Convergence diagnostics:
    optimization iterations = 100
    failed iterations = 0
    number of iterations with same best fit = 100
    frequency of best fit = 1.000

 object summary:
    'lik' -- likelihood function
    'bnd' -- bounds for likelihood search
    'res' -- optimization iteration summary
    'opt' -- maximum likelihood parameter estimates

5.2 Ornstein-Uhlenbeck (OU)

Code
OUModel <- fitContinuous(phy = phyData, # phylogeny 
                         dat = log(leaf_area), # trait 
                         model = "OU", # evolutionary model
                         control = list(method = c("subplex")))

OUModel
GEIGER-fitted comparative model of continuous data
 fitted 'OU' model parameters:
    alpha = 2.718282
    sigsq = 41.672016
    z0 = 6.868176

 model summary:
    log-likelihood = -1359.134310
    AIC = 2724.268621
    AICc = 2724.310726
    free parameters = 3

Convergence diagnostics:
    optimization iterations = 100
    failed iterations = 0
    number of iterations with same best fit = 3
    frequency of best fit = 0.030

 object summary:
    'lik' -- likelihood function
    'bnd' -- bounds for likelihood search
    'res' -- optimization iteration summary
    'opt' -- maximum likelihood parameter estimates

5.3 Early Burst (EB)

Code
EBModel <- fitContinuous(phy = phyData, # phylogeny
                         dat = log(leaf_area), # trait 
                         model = "EB")

EBModel
GEIGER-fitted comparative model of continuous data
 fitted 'EB' model parameters:
    a = -0.000001
    sigsq = 35.458753
    z0 = 6.127103

 model summary:
    log-likelihood = -2534.055801
    AIC = 5074.111601
    AICc = 5074.153706
    free parameters = 3

Convergence diagnostics:
    optimization iterations = 100
    failed iterations = 0
    number of iterations with same best fit = 1
    frequency of best fit = 0.010

 object summary:
    'lik' -- likelihood function
    'bnd' -- bounds for likelihood search
    'res' -- optimization iteration summary
    'opt' -- maximum likelihood parameter estimates

And recover the parameter values and fit estimates.

Code
BMModel

OUModel

EBModel

Compare all models and select the best fitting model. To to that, we will use AIC model comporison approach based on weights.

Code
# Vector of models
EvoMods <- c(BMModel$opt$aicc, OUModel$opt$aicc, EBModel$opt$aicc)
# rename the models
names(EvoMods) <- c("BM", "OU", "EB")

# Run AIC weights 
EvoModsComparison <- aicw(EvoMods)

names(EvoModsComparison)[1] <- "AIC"

EvoModsComparison$lnL <- c(brownianModel$opt$lnL, OUModel$opt$lnL, EBModel$opt$lnL)

kable(EvoModsComparison)
AIC delta w lnL
BM 5072.124 2347.813 0 -2534.051
OU 2724.311 0.000 1 -1359.134
EB 5074.154 2349.843 0 -2534.056

Make a table with the AIC and lnL values for each model. Which model provides the best fit for leaf area?

Answer: According to the model comparison using AIC, the model that best fits the data is the OU model.

Now, add the results for a model fitting analysis of plant size to this table.

Code
BM_PH <- fitContinuous(phy = phyData, # phylogeny
                         dat = log(plant_height), # trait 
                         model = "BM")

OU_PH <- fitContinuous(phy = phyData, # phylogeny
                         dat = log(plant_height), # trait 
                         model = "OU")
Warning in fitContinuous(phy = phyData, dat = log(plant_height), model = "OU"): 
Parameter estimates appear at bounds:
    alpha
Code
EB_PH <- fitContinuous(phy = phyData, # phylogeny
                         dat = log(plant_height), # trait 
                         model = "EB")
Warning in fitContinuous(phy = phyData, dat = log(plant_height), model = "EB"): 
Parameter estimates appear at bounds:
    a
Code
# Vector of models
mods_PH <- c(BM_PH$opt$aicc, OU_PH$opt$aicc, EB_PH$opt$aicc)
# rename the models
names(mods_PH) <- c("BM", "OU", "EB")

# Run AIC weights 
aic_PH <- aicw(mods_PH) 
names(aic_PH)[1] <- "AIC"

aic_PH$lnL <- c(BM_PH$opt$lnL, OU_PH$opt$lnL, EB_PH$opt$lnL)

kable(aic_PH)
AIC delta w lnL
BM 4552.629 2211.921 0 -2274.304
OU 2340.709 0.000 1 -1167.333
EB 4554.659 2213.951 0 -2274.309

One last thing that I would like to present is the estimation of the metric phylogenetic half-life that can be interpreted as the time it takes for half of the information in the phylogeny to be erased.

Code
# half-life leaf area
log(2) / OUModel$opt$alpha 
[1] 0.2549946

6 Measuring phylogenetic diversity within communities

Note

The main goal of this section is to present basic understanding about measuring phylogenetic diversity within communities or best known as the analysis of community phylogenetics. The community phylogenetics integrates ecological and evolutionary concepts and explores the mechanisms (e.g., biotic interactions or environmental filters) governing the assembly of ecological communities.

There are different sources of information and web pages with a lot of information about this field. The most common and useful are the web pages of the books: Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology and Phylogenies in Ecology. Among the most influential papers in this field are Phylogenies and Community Ecology by Webb et al. (2002) and The merging of community ecology and phylogenetic biology by Cavender-Bares et al. (2009).

Load the phylogenetic tree and community data.

Code
## Phylogenetic data
phyData <- force.ultrametric(read.nexus("https://raw.githubusercontent.com/jesusNPL/Workshop_PCM_HandsOn/refs/heads/main/Data/CDR_timeTree_tacted.nex"))
***************************************************************
*                          Note:                              *
*    force.ultrametric does not include a formal method to    *
*    ultrametricize a tree & should only be used to coerce    *
*   a phylogeny that fails is.ultrametric due to rounding --  *
*    not as a substitute for formal rate-smoothing methods.   *
***************************************************************
Code
## Community data
cdmData <- read_csv("https://raw.githubusercontent.com/jesusNPL/Workshop_PCM_HandsOn/refs/heads/main/Data/CDR_biocon_sample.csv")
Rows: 144 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): ctrt, ntrt
dbl (19): LTERYear, plot, ring, Achillea_millefolium, Elymus_repens, Amorpha...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s arrange the BioCON sample data to create an object that allow us to calculate phylogenetic diversity metrics.

Code
glimpse(cdmData)
Rows: 144
Columns: 21
$ LTERYear                <dbl> 1998, 1998, 1998, 1998, 1998, 1998, 1998, 1998…
$ plot                    <dbl> 13, 23, 30, 33, 43, 45, 51, 52, 64, 67, 69, 10…
$ ring                    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2…
$ ctrt                    <chr> "Cenrich", "Cenrich", "Cenrich", "Cenrich", "C…
$ ntrt                    <chr> "Nenrich", "Namb", "Nenrich", "Namb", "Nenrich…
$ Achillea_millefolium    <dbl> 59.750, 20.000, 49.190, 42.945, 50.000, 52.560…
$ Elymus_repens           <dbl> 5.00, 3.50, 2.00, 9.00, 0.80, 0.75, 0.60, 1.25…
$ Amorpha_canescens       <dbl> 0.000, 0.150, 0.050, 0.300, 0.020, 0.050, 0.25…
$ Andropogon_gerardi      <dbl> 0.00, 0.02, 0.00, 0.00, 0.70, 0.00, 0.00, 0.00…
$ Anemone_cylindrica      <dbl> 0.000, 0.000, 0.000, 0.000, 0.000, 0.020, 0.00…
$ Asclepias_tuberosa      <dbl> 0.000, 0.550, 0.050, 1.200, 0.300, 0.000, 0.30…
$ Bouteloua_gracilis      <dbl> 4.000, 11.475, 5.000, 11.000, 0.000, 0.000, 0.…
$ Bromus_inermis          <dbl> 2.00, 19.50, 10.50, 7.00, 14.25, 5.50, 4.00, 6…
$ Koeleria_micrathera     <dbl> 3.00, 0.00, 0.55, 3.50, 0.00, 0.00, 0.50, 1.50…
$ Lespedeza_capitata      <dbl> 0.900, 4.500, 1.250, 2.250, 1.100, 1.250, 0.90…
$ Lupinus_perennis        <dbl> 3.500, 20.000, 9.000, 14.000, 2.500, 18.500, 3…
$ Dalea_villosa           <dbl> 0.00, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.01…
$ Poa_pratensis           <dbl> 20.500, 12.500, 20.940, 7.000, 27.915, 17.000,…
$ Schizachyrium_scoparium <dbl> 0.500, 3.000, 0.020, 0.000, 0.000, 0.000, 0.00…
$ Solidago_rigida         <dbl> 0.000, 0.000, 1.000, 0.100, 0.050, 0.250, 0.30…
$ Sorghastrum_nutans      <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.50, 0.00, 0.00…
Code
cdmData <- cdmData %>% 
  mutate(bioconID = paste0(ctrt, "_", ntrt, "_", plot, "_", LTERYear)) %>% 
  dplyr::select(!c(ctrt, ntrt, plot, ring, LTERYear)) %>% 
  dplyr::select(bioconID, everything()) %>% 
  column_to_rownames("bioconID")
  
cdrData <- match.phylo.comm(phy = phyData, 
                            comm = cdmData)
[1] "Dropping tips from the tree because they are not present in the community data:"
  [1] "Juniperus_communis"            "Juniperus_sp."                
  [3] "Juniperus_virginiana"          "Picea_sp."                    
  [5] "Larix_laricina"                "Pinus_strobus"                
  [7] "Pinus_resinosa"                "Pinus_sp."                    
  [9] "Pinus_banksiana"               "Lemna_sp."                    
 [11] "Sagittaria_latifolia"          "Uvularia_sessilifolia"        
 [13] "Streptopus_lanceolatus"        "Smilax_herbacea"              
 [15] "Hypoxis_hirsuta"               "Platanthera_lacera"           
 [17] "Iris_versicolor"               "Elymus_trachycaulus"          
 [19] "Allium_stellatum"              "Allium_sp."                   
 [21] "Allium_cernuum"                "Sisyrinchium_campestre"       
 [23] "Asparagus_officinalis"         "Maianthemum_stellatum"        
 [25] "Maianthemum_canadense"         "Polygonatum_sp."              
 [27] "Polygonatum_biflorum"          "Tradescantia_sp."             
 [29] "Tradescantia_bracteata"        "Tradescantia_occidentalis"    
 [31] "Typha_sp."                     "Typha_latifolia"              
 [33] "Typha_angustifolia"            "Juncus_sp."                   
 [35] "Juncus_tenuis"                 "Hierochloe_odorata"           
 [37] "Anthoxanthum_nitens"           "Scleria_triglomerata"         
 [39] "Eleocharis_sp."                "Eleocharis_geniculata"        
 [41] "Cyperus_filiculmis"            "Cyperus_schweinitzii"         
 [43] "Cyperus_sp."                   "Cyperus_strigosus"            
 [45] "Dulichium_arundinaceum"        "Scirpus_cyperinus"            
 [47] "Carex_disperma"                "Carex_echinata"               
 [49] "Carex_interior"                "Carex_canescens"              
 [51] "Carex_vulpinoidea"             "Carex_foenea"                 
 [53] "Carex_diandra"                 "Carex_bicknellii"             
 [55] "Carex_lativena"                "Carex_bebbii"                 
 [57] "Carex_scoparia"                "Carex_sp."                    
 [59] "Carex_chordorrhiza"            "Carex_haydenii"               
 [61] "Carex_stricta"                 "Carex_pensylvanica"           
 [63] "Carex_gracillima"              "Carex_oligosperma"            
 [65] "Carex_lasiocarpa"              "Carex_rostrata"               
 [67] "Carex_comosa"                  "Carex_lacustris"              
 [69] "Chamaenerion_angustifolium"    "Cenchrus_americanus"          
 [71] "Pilosella_aurantiaca"          "Frangula_alnus"               
 [73] "Demia_testudo"                 "Bouteloua_dactyloides"        
 [75] "Leersia_oryzoides"             "Glyceria_sp."                 
 [77] "Glyceria_grandis"              "Miscanthus_giganteus"         
 [79] "Schizachne_purpurascens"       "Bromus_tectorum"              
 [81] "Bromus_ciliatus"               "Bromus_kalmii"                
 [83] "Bromus_sp."                    "Lysimachia_borealis"          
 [85] "Secale_cereale"                "Secale_vavilovii"             
 [87] "Elymus_canadensis"             "Andropogon_gerardii"          
 [89] "Phalaris_arundinacea"          "Calamagrostis_canadensis"     
 [91] "Agrostis_sp."                  "Agrostis_scabra"              
 [93] "Phleum_pratense"               "Festuca_rubra"                
 [95] "Festuca_guestphalica"          "Festuca_ovina"                
 [97] "Festuca_sp."                   "Festuca_guestfalica"          
 [99] "Lolium_multiflorum"            "Poa_sp."                      
[101] "Poa_compressa"                 "Poa_nemoralis"                
[103] "Poa_palustris"                 "Hesperostipa_spartea"         
[105] "Hesperostipa_comata"           "Calamagrostis_stricta"        
[107] "Calamagrostis_inexpansa"       "Koeleria_pyramidata"          
[109] "Danthonia_spicata"             "Aristida_sp."                 
[111] "Aristida_tuberculosa"          "Aristida_basiramea"           
[113] "Eragrostis_trichodes"          "Eragrostis_spectabilis"       
[115] "Cycloloma_atriplicifolium"     "Dysphania_atriplicifolia"     
[117] "Muhlenbergia_racemosa"         "Bouteloua_aristidoides"       
[119] "Bouteloua_curtipendula"        "Bouteloua_hirsuta"            
[121] "Sporobolus_rigidus"            "Sporobolus_heterolepis"       
[123] "Aster_sp."                     "Sporobolus_cryptandrus"       
[125] "Paspalum_setaceum"             "Setaria_sp."                  
[127] "Setaria_viridis"               "Setaria_italica"              
[129] "Digitaria_sp."                 "Digitaria_sanguinalis"        
[131] "Digitaria_cognata"             "Cenchrus_longispinus"         
[133] "Roosia_sp."                    "Digitaria_ischaemum"          
[135] "Chondrosum_hirsutum"           "Trachypogon_spicatus"         
[137] "Ageratina_liebmannii"          "Calamagrostis_sp."            
[139] "Bouteloua_sp."                 "Panicum_miliaceum"            
[141] "Panicum_capillare"             "Panicum_virgatum"             
[143] "Panicum_sp."                   "Panicum_acuminatum"           
[145] "Dichanthelium_praecocius"      "Panicum_oligosanthes"         
[147] "Panicum_perlongum"             "Dichanthelium_oligosanthes"   
[149] "Dichanthelium_acuminatum"      "Panicum_leibergii"            
[151] "Dichanthelium_linearifolium"   "Caltha_palustris"             
[153] "Anemone_patens"                "Pulsatilla_patens"            
[155] "Anemone_quinquefolia"          "Anemone_virginiana"           
[157] "Delphinium_carolinianum"       "Thalictrum_dioicum"           
[159] "Thalictrum_dasycarpum"         "Ranunculus_rhomboideus"       
[161] "Aquilegia_canadensis"          "Parthenocissus_quinquefolia"  
[163] "Vitis_vulpina"                 "Vitis_riparia"                
[165] "Parthenocissus_inserta"        "Parthenocissus_vitacea"       
[167] "Parthenocissus_sp."            "Ribes_hirtellum"              
[169] "Heuchera_richardsonii"         "Saxifraga_pensylvanica"       
[171] "Micranthes_pensylvanica"       "Penthorum_sedoides"           
[173] "Geranium_maculatum"            "Ludwigia_polycarpa"           
[175] "Circaea_lutetiana"             "Epilobium_sp."                
[177] "Epilobium_leptophyllum"        "Epilobium_ciliatum"           
[179] "Oenothera_parviflora"          "Oenothera_biennis"            
[181] "Oenothera_sp."                 "Oenothera_rhombipetala"       
[183] "Zanthoxylum_americanum"        "Acer_negundo"                 
[185] "Acer_sp."                      "Acer_rubrum"                  
[187] "Toxicodendron_radicans"        "Rhus_sp."                     
[189] "Rhus_glabra"                   "Rhus_typhina"                 
[191] "Lechea_stricta"                "Helianthemum_bicknellii"      
[193] "Crocanthemum_bicknellii"       "Berteroa_incana"              
[195] "Leavenworthia_sp."             "Sisymbrium_altissimum"        
[197] "Barbarea_vulgaris"             "Arabis_divaricarpa"           
[199] "Boechera_divaricarpa"          "Turritis_glabra"              
[201] "Lepidium_densiflorum"          "Lepidium_sp."                 
[203] "Lepidium_virginicum"           "Capsellabursa-pastoris"       
[205] "Arabis_sp."                    "Arabis_hirsuta"               
[207] "Celastrus_scandens"            "Petalostemon_candidus"        
[209] "Oxalis_sp."                    "Oxalis_stricta"               
[211] "Euphorbia_sp."                 "Euphorbia_corollata"          
[213] "Euphorbia_geyeri"              "Euphorbia_glyptosperma"       
[215] "Euphorbia_maculata"            "Viola_selkirkii"              
[217] "Viola_pedatifida"              "Viola_macloskeyi"             
[219] "Viola_palustris"               "Viola_pedata"                 
[221] "Viola_sp."                     "Viola_sagittata"              
[223] "Populus_grandidentata"         "Populus_tremuloides"          
[225] "Populus_deltoides"             "Populus_balsamifera"          
[227] "Salix_pyrifolia"               "Salix_bebbiana"               
[229] "Salix_sp."                     "Salix_pedicellaris"           
[231] "Salix_discolor"                "Salix_interior"               
[233] "Salix_petiolaris"              "Salix_humilis"                
[235] "Salix_serissima"               "Elymus_smithii"               
[237] "Parogonum_ciliinode"           "Ceanothus_americanus"         
[239] "Parietaria_pensylvanica"       "Pilea_pumila"                 
[241] "Urtica_dioica"                 "Ulmus_sp."                    
[243] "Ulmus_pumila"                  "Ulmus_americana"              
[245] "Spiraea_tomentosa"             "Spiraea_alba"                 
[247] "Prunus_virginiana"             "Prunus_serotina"              
[249] "Prunus_americana"              "Prunus_sp."                   
[251] "Prunus_pensylvanica"           "Prunus_pumila"                
[253] "Malus_domestica"               "Amelanchier_arborea"          
[255] "Aronia_melanocarpa"            "Agrimonia_sp."                
[257] "Amelanchier_laevis"            "Amelanchier_interior"         
[259] "Amelanchier_alnifolia"         "Amelanchier_sanguinea"        
[261] "Rubus_idaeus"                  "Rubus_pubescens"              
[263] "Rubus_occidentalis"            "Rubus_flagellaris"            
[265] "Rubus_sp."                     "Rubus_canadensis"             
[267] "Rubus_allegheniensis"          "Rosa_carolina"                
[269] "Rosa_arkansana"                "Rosa_blanda"                  
[271] "Potentilla_sp."                "Potentilla_simplex"           
[273] "Potentilla_norvegica"          "Potentilla_recta"             
[275] "Potentilla_argentea"           "Amelanchier_spicata"          
[277] "Comarum_palustre"              "Amelanchier_sp."              
[279] "Potentilla_arguta"             "Drymocallis_arguta"           
[281] "Fragaria_virginiana"           "Fragaria_sp."                 
[283] "Fragaria_vesca"                "Alnus_incana"                 
[285] "Corylus_americana"             "Betula_papyrifera"            
[287] "Betula_pumila"                 "Betula_alleghaniensis"        
[289] "Quercus_petraea"               "Quercus_ellipsoidalis"        
[291] "Quercus_sp."                   "Quercus_alba"                 
[293] "Quercus_macrocarpa"            "Polygala_sp."                 
[295] "Polygala_polygama"             "Polygala_sanguinea"           
[297] "Chamaecrista_fasciculata"      "Strophostyles_leiosperma"     
[299] "Amphicarpa_bracteata"          "Amphicarpaea_bracteata"       
[301] "Desmodium_glutinosum"          "Hylodesmum_glutinosum"        
[303] "Calamovilfa_longifolia"        "Desmodium_canadense"          
[305] "Dalea_candida"                 "Spartina_pectinata"           
[307] "Dalea_purpurea"                "Lupinus_sp."                  
[309] "Baptisia_alba"                 "Senecio_sp."                  
[311] "Senecio_aureus"                "Robinia_pseudoacacia"         
[313] "Trientalis_borealis"           "Astragalus_canadensis"        
[315] "Buchloe_dactyloides"           "Medicago_lupulina"            
[317] "Medicago_sp."                  "Medicago_sativa"              
[319] "Melilotus_sp."                 "Melilotus_officinalis"        
[321] "Melilotus_albus"               "Lathyrus_venosus"             
[323] "Lathyrus_ochroleucus"          "Vicia_sativa"                 
[325] "Vicia_setifolia"               "Vicia_villosa"                
[327] "Trifolium_hybridum"            "Trifolium_repens"             
[329] "Trifolium_pratense"            "Trifolium_sp."                
[331] "Trifolium_arvense"             "Comandra_umbellata"           
[333] "Pennisetum_glaucum"            "Maianthemum_racemosum"        
[335] "Hypericum_majus"               "Triadenum_fraseri"            
[337] "Salsola_kali"                  "Amaranthus_retroflexus"       
[339] "Sporobolus_michauxianus"       "Chenopodium_sp."              
[341] "Chenopodium_leptophyllum"      "Chenopodium_album"            
[343] "Chenopodium_hybridum"          "Chenopodiastrum_hybridum"     
[345] "Campanula_aparinoides"         "Campanula_sp."                
[347] "Palustricodon_aparinoides"     "Silene_latifolia"             
[349] "Moehringia_lateriflora"        "Stellaria_media"              
[351] "Silene_sp."                    "Silene_antirrhina"            
[353] "Stellaria_longifolia"          "Portulaca_oleracea"           
[355] "Petalostemon_purpureus"        "Mollugo_verticillata"         
[357] "Mirabilis_hirsuta"             "Mirabilis_albida"             
[359] "Mirabilis_nyctaginea"          "Cornus_alternifolia"          
[361] "Cornus_sericea"                "Cornus_sp."                   
[363] "Cornus_racemosa"               "Elymus_violaceus"             
[365] "Lysimachia_ciliata"            "Lysimachia_thyrsiflora"       
[367] "Phlox_sp."                     "Phlox_pilosa"                 
[369] "Phlox_paniculata"              "Impatiens_capensis"           
[371] "Pyrola_sp."                    "Pyrola_rotundifolia"          
[373] "Chamaedaphne_calyculata"       "Vaccinium_sp."                
[375] "Vaccinium_angustifolium"       "Vaccinium_myrtilloides"       
[377] "Ilex_sp."                      "Ilex_verticillata"            
[379] "Sporobolus_sp."                "Rhamnus_sp."                  
[381] "Rhamnus_cathartica"            "Polygonum_sp."                
[383] "Polygonum_tenue"               "Polygonella_articulata"       
[385] "Fallopia_cilinodis"            "Fallopia_convolvulus"         
[387] "Fallopia_scandens"             "Rumex_sp."                    
[389] "Rumex_britannica"              "Rumex_mexicanus"              
[391] "Rumex_acetosella"              "Persicaria_thunbergii"        
[393] "Persicaria_sagittata"          "Persicaria_amphibia"          
[395] "Persicaria_lapathifolia"       "Persicaria_pensylvanica"      
[397] "Persicaria_maculosa"           "Persicaria_hydropiper"        
[399] "Convolvulus_arvensis"          "Solanum_carolinense"          
[401] "Solanum_dulcamara"             "Solanum_americanum"           
[403] "Solanum_nigrum"                "Physalis_virginiana"          
[405] "Physalis_sp."                  "Physalis_heterophylla"        
[407] "Physalis_pubescens"            "Gentiana_andrewsii"           
[409] "Houstonia_longifolia"          "Galium_boreale"               
[411] "Galium_asperellum"             "Galium_obtusum"               
[413] "Galium_triflorum"              "Galium_trifidum"              
[415] "Galium_trifolium"              "Galium_concinnum"             
[417] "Galium_sp."                    "Galium_aparine"               
[419] "Apocynum_androsaemifolium"     "Apocynum_cannabinum"          
[421] "Gymnema_sp."                   "Asclepias_sp."                
[423] "Asclepias_verticillata"        "Asclepias_incarnata"          
[425] "Asclepias_viridiflora"         "Asclepias_ovalifolia"         
[427] "Asclepias_syriaca"             "Asclepias_exaltata"           
[429] "Hackelia_virginiana"           "Hackelia_deflexa"             
[431] "Lithospermum_carolinense"      "Lithospermum_sp."             
[433] "Lithospermum_canescens"        "Lithospermum_caroliniense"    
[435] "Fraxinus_nigra"                "Fraxinus_americana"           
[437] "Fraxinus_pennsylvanica"        "Penstemon_digitalis"          
[439] "Penstemon_gracilis"            "Penstemon_sp."                
[441] "Penstemon_cobaea"              "Penstemon_grandiflorus"       
[443] "Linaria_vulgaris"              "Veronicastrum_virginicum"     
[445] "Plantago_major"                "Plantago_lanceolata"          
[447] "Plantago_sp."                  "Plantago_patagonica"          
[449] "Verbascum_thapsus"             "Verbena_hastata"              
[451] "Verbena_stricta"               "Verbena_bracteata"            
[453] "Seemannia_sylvatica"           "Mimulus_ringens"              
[455] "Phryma_leptostachya"           "Utricularia_sp."              
[457] "Pedicularis_sp."               "Pedicularis_canadensis"       
[459] "Scutellaria_lateriflora"       "Scutellaria_parvula"          
[461] "Scutellaria_galericulata"      "Stachys_sp."                  
[463] "Stachys_palustris"             "Stachys_tenuifolia"           
[465] "Mosla_dianthera"               "Lycopus_americanus"           
[467] "Lycopus_sp."                   "Lycopus_uniflorus"            
[469] "Prunella_vulgaris"             "Agastache_foeniculum"         
[471] "Nepeta_cataria"                "Glechoma_hederacea"           
[473] "Monarda_fistulosa"             "Pycnanthemum_virginianum"     
[475] "Hedeoma_hispida"               "Monarda_punctata"             
[477] "Aralia_nudicaulis"             "Colladonia_sp."               
[479] "Cicuta_bulbifera"              "Lichtensteinia_sp."           
[481] "Zizia_aptera"                  "Zizia_aurea"                  
[483] "Viburnum_lentago"              "Symphoricarpos_occidentalis"  
[485] "Lonicera_dioica"               "Lonicera_morrowii"            
[487] "Lonicera_tatarica"             "Campanula_rotundifolia"       
[489] "Lobelia_spicata"               "Lobelia_siphilitica"          
[491] "Tragopogon_duarius"            "Tragopogon_dubius"            
[493] "Cirsium_discolor"              "Cirsium_sp."                  
[495] "Cirsium_altissimum"            "Cirsium_arvense"              
[497] "Vernonia_fasciculata"          "Hieracium_crassescens"        
[499] "Hieracium_sp."                 "Pilosella_longipila"          
[501] "Hieracium_longipilum"          "Hieracium_umbellatum"         
[503] "Lactuca_canadensis"            "Lactuca_biennis"              
[505] "Lactuca_sp."                   "Lactuca_serriola"             
[507] "Krigia_biflora"                "Sonchus_arvensis"             
[509] "Crepis_tectorum"               "Xyris_torta"                  
[511] "Combretum_sp."                 "Nabalus_racemosus"            
[513] "Nabalus_albus"                 "Taraxacum_officinale"         
[515] "Taraxacum_campylodes"          "Taraxacum_sp."                
[517] "Petalostemum_sp."              "Coreopsis_palmata"            
[519] "Bidens_tripartita"             "Bidens_cernua"                
[521] "Bidens_coronata"               "Bidens_sp."                   
[523] "Bidens_trichosperma"           "Helenium_autumnale"           
[525] "Eupatorium_maculatum"          "Eutrochium_maculatum"         
[527] "Eupatorium_perfoliatum"        "Liatris_aspera"               
[529] "Liatris_ligulistylis"          "Liatris_pycnostachya"         
[531] "Ratibida_pinnata"              "Rudbeckia_hirta"              
[533] "Echinacea_serotina"            "Echinacea_purpurea"           
[535] "Heliopsis_helianthoides"       "Ambrosia_artemisiifolia"      
[537] "Ambrosia_psilostachya"         "Helianthus_sp."               
[539] "Helianthus_hirsutus"           "Helianthus_petiolaris"        
[541] "Helianthus_laetiflorus"        "Helianthus_giganteus"         
[543] "Packera_aurea"                 "Packera_paupercula"           
[545] "Erechtites_hieracifolia"       "Erechtites_hieraciifolius"    
[547] "Achillea_sp."                  "Artemisia_campestris"         
[549] "Artemisia_sp."                 "Artemisia_ludoviciana"        
[551] "Gnaphalium_sp."                "Gnaphalium_uliginosum"        
[553] "Gnaphalium_purpureum"          "Gamochaeta_purpurea"          
[555] "Antennaria_plantaginifolia"    "Antennaria_parlinii"          
[557] "Antennaria_sp."                "Antennaria_neglecta"          
[559] "Doellingeria_umbellata"        "Heterotheca_villosa"          
[561] "Symphyotrichum_boreale"        "Erigeron_sp."                 
[563] "Erigeron_canadensis"           "Erigeron_philadelphicus"      
[565] "Erigeron_strigosus"            "Erigeron_annuus"              
[567] "Euthamia_graminifolia"         "Symphyotrichumnovae-angliae"  
[569] "Symphyotrichum_oolentangiense" "Symphyotrichum_puniceum"      
[571] "Symphyotrichum_ericoides"      "Symphyotrichum_lanceolatum"   
[573] "Eurybia_macrophylla"           "Solidago_ptarmicoides"        
[575] "Solidago_sp."                  "Solidago_canadensis"          
[577] "Solidago_nemoralis"            "Solidago_speciosa"            
[579] "Solidago_altissima"            "Solidago_missouriensis"       
[581] "Solidago_juncea"               "Solidago_gigantea"            

Explore the results

Code
glimpse(cdrData$comm) # Community data matrx
Rows: 144
Columns: 16
$ Koeleria_micrathera     <dbl> 3.00, 0.00, 0.55, 3.50, 0.00, 0.00, 0.50, 1.50…
$ Bromus_inermis          <dbl> 2.00, 19.50, 10.50, 7.00, 14.25, 5.50, 4.00, 6…
$ Elymus_repens           <dbl> 5.00, 3.50, 2.00, 9.00, 0.80, 0.75, 0.60, 1.25…
$ Andropogon_gerardi      <dbl> 0.00, 0.02, 0.00, 0.00, 0.70, 0.00, 0.00, 0.00…
$ Poa_pratensis           <dbl> 20.500, 12.500, 20.940, 7.000, 27.915, 17.000,…
$ Bouteloua_gracilis      <dbl> 4.000, 11.475, 5.000, 11.000, 0.000, 0.000, 0.…
$ Sorghastrum_nutans      <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.50, 0.00, 0.00…
$ Schizachyrium_scoparium <dbl> 0.500, 3.000, 0.020, 0.000, 0.000, 0.000, 0.00…
$ Anemone_cylindrica      <dbl> 0.000, 0.000, 0.000, 0.000, 0.000, 0.020, 0.00…
$ Lespedeza_capitata      <dbl> 0.900, 4.500, 1.250, 2.250, 1.100, 1.250, 0.90…
$ Amorpha_canescens       <dbl> 0.000, 0.150, 0.050, 0.300, 0.020, 0.050, 0.25…
$ Dalea_villosa           <dbl> 0.00, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.01…
$ Lupinus_perennis        <dbl> 3.500, 20.000, 9.000, 14.000, 2.500, 18.500, 3…
$ Asclepias_tuberosa      <dbl> 0.000, 0.550, 0.050, 1.200, 0.300, 0.000, 0.30…
$ Achillea_millefolium    <dbl> 59.750, 20.000, 49.190, 42.945, 50.000, 52.560…
$ Solidago_rigida         <dbl> 0.000, 0.000, 1.000, 0.100, 0.050, 0.250, 0.30…
Code
str(cdrData$phy) # Phylogenetic data
List of 4
 $ edge       : int [1:30, 1:2] 17 18 18 19 20 21 21 22 22 20 ...
 $ edge.length: num [1:30] 54.7 81.1 26.1 14.7 14.5 ...
 $ Nnode      : int 15
 $ tip.label  : chr [1:16] "Koeleria_micrathera" "Bromus_inermis" "Elymus_repens" "Andropogon_gerardi" ...
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"
 - attr(*, "RSS")= num 2.77e-07

Awesome, we are now ready to explore the jungle of metrics for the evaluation of trait and phylogenetic structure of communities (Pausas & Verdú, 2010). Let’s start with metrics for the taxonomic dimension.

6.1 Explore diversity metrics

Let’s calculate some metrics manually and then using the package {picante} we will calculate the same metrics but for all communities at once.

6.1.1 Phylogenetic diversity

Phylogenetic diversity is just the sum of the total branch lengths in the community. In this case we are calculating PD using all species in the phylogeny, in other words, assuming that a single community contain the same number of species as the phylogeny.

Code
sum(cdrData$phy$edge.length) #  sum of the total branch lengths in the community
[1] 1088.545

We can calculate PD for individual plots or communities, for example the plot Cenrich_Nenrich_13_1998

Code
# Select species that are only present in the plot Cenrich_Nenrich_13_1998
CeNe_13_1998 <- cdrData$comm %>% 
  filter(rownames(.) == "Cenrich_Nenrich_13_1998") %>% 
  t() %>% 
  data.frame() %>% 
  filter(Cenrich_Nenrich_13_1998 > 0)

# Drop species in the phylogeny that are not present in the plot HARV_001
CeNe_13_1998_phy <- drop.tip(cdrData$phy, 
                         setdiff(cdrData$phy$tip.label, 
                                 rownames(CeNe_13_1998)))

sum(CeNe_13_1998_phy$edge.length)
[1] 698.7762

We can confirm the result by calculating PD using the package {picante}. However, instead of calculating plot by plot {picante} will calculate PD for all of the plots.

Code
bioCON_PD <- pd(samp = cdrData$comm, 
              tree = cdrData$phy, 
              include.root = FALSE) # Faith's PD

head(bioCON_PD)
                              PD SR
Cenrich_Nenrich_13_1998 698.7762  9
Cenrich_Namb_23_1998    798.6760 11
Cenrich_Nenrich_30_1998 880.4261 12
Cenrich_Namb_33_1998    893.3589 12
Cenrich_Nenrich_43_1998 725.5825 10
Cenrich_Namb_45_1998    787.2604 10

We NOW can confirm the PD value estimated by hand is equal to the PD estimated using {picante}.

Argument include.root = TRUE

Using the argument “include.root = TRUE” in the function pd() of {picante} will return a slightly different PD value as the root of the phylogeny is considered in the calculation.

You can see both metrics SR and PD are highly correlated, that is because PD is highly sensitive to the sample size, i.e., as more species are added to the community more branch lengths are added to the phylogeny and consequently the expected relationship is high.

Code
bioCON_PD %>% 
  ggplot(aes(x = SR, y = PD)) + 
  geom_point(size = 3, color = "darkgray") + 
  geom_smooth(method = "lm", se = FALSE) +
  theme_CDR()
`geom_smooth()` using formula = 'y ~ x'

Now see correlation between the two metrics

Code
bioCON_PD %$% 
  cor.test(SR, PD, use = "complete.obs")

    Pearson's product-moment correlation

data:  SR and PD
t = 45.443, df = 142, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9547942 0.9763828
sample estimates:
      cor 
0.9672964 

Let’s plot the association again…

Code
# plot
bioCON_PD %>% 
  ggplot(aes(x = SR, y = PD)) + 
  geom_point(size = 3, color = "darkgray") + 
  labs(x = "Species richness", y = "PD (millions of years)") + 
  theme_CDR()

6.1.2 Mean pairwise distance (MPD) and mean nearest-pairwise distance (MNTD)

Other common metrics are MPD (mean pairwise distance) and MNTD (mean nearest taxon distance). As in PD, let’s calculate MPD and MNTD manually.

Code
# MPD
dist.trMB <- cophenetic(CeNe_13_1998_phy)

dist.trMB <- dist.trMB[lower.tri(dist.trMB, diag = FALSE)]

mean(dist.trMB)
[1] 201.8358
Code
# MNTD
dist.trMB2 <- cophenetic(CeNe_13_1998_phy)
diag(dist.trMB2) <- NA
apply(dist.trMB2, 2, min, na.rm = TRUE)
    Koeleria_micrathera          Bromus_inermis           Elymus_repens 
              162.12738                51.45713                51.45713 
          Poa_pratensis      Bouteloua_gracilis Schizachyrium_scoparium 
               80.46773                77.83999                77.83999 
     Lespedeza_capitata        Lupinus_perennis    Achillea_millefolium 
              127.02760               127.02760               245.67490 
Code
mean(apply(dist.trMB2, 2, min, na.rm = TRUE))
[1] 111.2133

And now using the package {picante}

Code
# MPD
bioCON_MPD <- mpd(samp = cdrData$comm, 
                  dis = cophenetic(cdrData$phy)) 

head(bioCON_MPD)
[1] 201.8358 209.7650 220.8059 224.4190 220.8688 227.9370
Code
# MNTD
bioCON_MNTD <- mntd(samp = cdrData$comm, 
                  dis = cophenetic(cdrData$phy))
head(bioCON_MNTD)
[1] 111.21327 103.49877  95.42167  99.56265  85.62067  99.08140

Until here we just played with the data and also confirmed that we can calculate different metrics by hand. Now let’s do a more formal evaluation of the biodiversity using the sample data from the BioCON at Cedar Creek, UMN.

7 Community phylogenetic diversity metrics

The analyses of community phylogenetic started making inferences about the mechanisms structuring the local communities through the evaluation of phylogenetic arrangements in local communities (see Cavender-Bares et al. (2009) for an initial criticism). However, new methods are now available, such that more complex balance between ecological and historical processes at local and regional scales can be incorporated into the analyses (Pigot & Etienne, 2015; Pinto-Ledezma et al., 2019; Pinto‐Ledezma et al., 2020).

Let’s calculate some of the most common metrics.

Note - we will use the object cdrData to store all the results.

7.0.1 Phylogenetic diversity in a community - PD

PD or phylogenetic diversity is the sum of the total phylogenetic branch length for one or multiple samples.

Code
bioCON_CDM <- ses.pd(samp = cdrData$comm, 
                   tree = cdrData$phy, 
                   runs = 999) # this will take some time

bioCON_CDM <- bioCON_CDM %>% 
  dplyr::select(ntaxa, pd.obs, pd.obs.z)


head(bioCON_CDM)
                        ntaxa   pd.obs    pd.obs.z
Cenrich_Nenrich_13_1998     9 698.7762 -0.61117980
Cenrich_Namb_23_1998       11 798.6760 -0.80587567
Cenrich_Nenrich_30_1998    12 880.4261 -0.25335949
Cenrich_Namb_33_1998       12 893.3589 -0.04006831
Cenrich_Nenrich_43_1998    10 725.5825 -1.07697449
Cenrich_Namb_45_1998       10 787.2604 -0.07198822

7.0.2 Phylogenetic Rao’s quadratic entropy - RaoD

Rao’s quadratic entropy (Rao 1982) is a measure of diversity in ecological communities that can optionally take species differences (e.g. phylogenetic dissimilarity) into account.

Code
bioCON_CDM <- bioCON_CDM %>% 
  mutate(RaoD = raoD(cdrData$comm, 
                     force.ultrametric(cdrData$phy))$Dkk)
***************************************************************
*                          Note:                              *
*    force.ultrametric does not include a formal method to    *
*    ultrametricize a tree & should only be used to coerce    *
*   a phylogeny that fails is.ultrametric due to rounding --  *
*    not as a substitute for formal rate-smoothing methods.   *
***************************************************************
Code
head(bioCON_CDM)
                        ntaxa   pd.obs    pd.obs.z     RaoD
Cenrich_Nenrich_13_1998     9 698.7762 -0.61117980 72.83234
Cenrich_Namb_23_1998       11 798.6760 -0.80587567 92.48006
Cenrich_Nenrich_30_1998    12 880.4261 -0.25335949 82.38505
Cenrich_Namb_33_1998       12 893.3589 -0.04006831 90.44613
Cenrich_Nenrich_43_1998    10 725.5825 -1.07697449 76.02816
Cenrich_Namb_45_1998       10 787.2604 -0.07198822 79.58355

7.0.3 Mean pairwise distance separating taxa in a community - MPD

Code
# SES-MPD
bioCONsesmpd <- ses.mpd(samp = cdrData$comm, 
                      dis = cophenetic(cdrData$phy), 
                      null.model = "taxa.labels", 
                      abundance.weighted = TRUE, 
                      runs = 999)

bioCONsesmpd <- bioCONsesmpd %>% 
  dplyr::select(mpd.obs, mpd.obs.z)

bioCON_CDM <- bind_cols(bioCON_CDM, bioCONsesmpd)

7.0.4 Mean nearest taxon distance for taxa in a community - MNTD

Code
# SES-MNTD
bioCONsesmntd <- ses.mntd(samp = cdrData$comm, 
                        dis = cophenetic(cdrData$phy), 
                        null.model = "taxa.labels", 
                        abundance.weighted = TRUE,
                        runs = 999)

bioCONsesmntd <- bioCONsesmntd %>% 
  dplyr::select(mntd.obs, mntd.obs.z)

bioCON_CDM <- bind_cols(bioCON_CDM, bioCONsesmntd)

7.0.5 Phylogenetic species variability - PSV

Phylogenetic species variability quantifies how phylogenetic relatedness decreases the variance of a hypothetical unselected/neutral trait shared by all species in a community.

Code
# PSV or phylogenetic species variability
bioCON_CDM <- bioCON_CDM %>% 
  mutate(PSV = psv(samp = cdrData$comm, 
               tree = cdrData$phy, 
               compute.var = TRUE)$PSVs)

head(bioCON_CDM)
                        ntaxa   pd.obs    pd.obs.z     RaoD  mpd.obs  mpd.obs.z
Cenrich_Nenrich_13_1998     9 698.7762 -0.61117980 72.83234 145.6647 0.73128775
Cenrich_Namb_23_1998       11 798.6760 -0.80587567 92.48006 184.9601 0.09906262
Cenrich_Nenrich_30_1998    12 880.4261 -0.25335949 82.38505 164.7701 0.57432774
Cenrich_Namb_33_1998       12 893.3589 -0.04006831 90.44613 180.8923 0.79241741
Cenrich_Nenrich_43_1998    10 725.5825 -1.07697449 76.02816 152.0563 0.44127453
Cenrich_Namb_45_1998       10 787.2604 -0.07198822 79.58355 159.1671 0.80855425
                         mntd.obs mntd.obs.z       PSV
Cenrich_Nenrich_13_1998 182.39483  1.5313390 0.7433659
Cenrich_Namb_23_1998    109.41470  0.1185387 0.7725693
Cenrich_Nenrich_30_1998  61.04982 -1.0960216 0.8132330
Cenrich_Namb_33_1998     70.97279 -0.9693872 0.8265401
Cenrich_Nenrich_43_1998  57.41421 -1.2081048 0.8134647
Cenrich_Namb_45_1998     64.13443 -1.0781152 0.8394972

7.0.6 Phylogenetic species richness - PSR

Phylogenetic species richness is the number of species in a sample multiplied by PSV.

Code
# PSR or phylogenetic species richness
bioCON_CDM <- bioCON_CDM %>% 
  mutate(PSR = psr(samp = cdrData$comm, 
               tree = cdrData$phy, 
               compute.var = TRUE)$PSR)

head(bioCON_CDM)
                        ntaxa   pd.obs    pd.obs.z     RaoD  mpd.obs  mpd.obs.z
Cenrich_Nenrich_13_1998     9 698.7762 -0.61117980 72.83234 145.6647 0.73128775
Cenrich_Namb_23_1998       11 798.6760 -0.80587567 92.48006 184.9601 0.09906262
Cenrich_Nenrich_30_1998    12 880.4261 -0.25335949 82.38505 164.7701 0.57432774
Cenrich_Namb_33_1998       12 893.3589 -0.04006831 90.44613 180.8923 0.79241741
Cenrich_Nenrich_43_1998    10 725.5825 -1.07697449 76.02816 152.0563 0.44127453
Cenrich_Namb_45_1998       10 787.2604 -0.07198822 79.58355 159.1671 0.80855425
                         mntd.obs mntd.obs.z       PSV      PSR
Cenrich_Nenrich_13_1998 182.39483  1.5313390 0.7433659 6.690293
Cenrich_Namb_23_1998    109.41470  0.1185387 0.7725693 8.498262
Cenrich_Nenrich_30_1998  61.04982 -1.0960216 0.8132330 9.758796
Cenrich_Namb_33_1998     70.97279 -0.9693872 0.8265401 9.918481
Cenrich_Nenrich_43_1998  57.41421 -1.2081048 0.8134647 8.134647
Cenrich_Namb_45_1998     64.13443 -1.0781152 0.8394972 8.394972

7.0.7 Phylogenetic species evenness - PSE

Phylogenetic species evenness is the metric PSV modified to incorporate relative species abundances.

Code
# PSE or phylogenetic species evenness
bioCON_CDM <- bioCON_CDM %>% 
  mutate(PSE = pse(samp = cdrData$comm, 
               tree = cdrData$phy)$PSEs)

head(bioCON_CDM)
                        ntaxa   pd.obs    pd.obs.z     RaoD  mpd.obs  mpd.obs.z
Cenrich_Nenrich_13_1998     9 698.7762 -0.61117980 72.83234 145.6647 0.73128775
Cenrich_Namb_23_1998       11 798.6760 -0.80587567 92.48006 184.9601 0.09906262
Cenrich_Nenrich_30_1998    12 880.4261 -0.25335949 82.38505 164.7701 0.57432774
Cenrich_Namb_33_1998       12 893.3589 -0.04006831 90.44613 180.8923 0.79241741
Cenrich_Nenrich_43_1998    10 725.5825 -1.07697449 76.02816 152.0563 0.44127453
Cenrich_Namb_45_1998       10 787.2604 -0.07198822 79.58355 159.1671 0.80855425
                         mntd.obs mntd.obs.z       PSV      PSR       PSE
Cenrich_Nenrich_13_1998 182.39483  1.5313390 0.7433659 6.690293 0.6035471
Cenrich_Namb_23_1998    109.41470  0.1185387 0.7725693 8.498262 0.7493335
Cenrich_Nenrich_30_1998  61.04982 -1.0960216 0.8132330 9.758796 0.6620203
Cenrich_Namb_33_1998     70.97279 -0.9693872 0.8265401 9.918481 0.7267966
Cenrich_Nenrich_43_1998  57.41421 -1.2081048 0.8134647 8.134647 0.6222520
Cenrich_Namb_45_1998     64.13443 -1.0781152 0.8394972 8.394972 0.6513511

We have calculated several metrics that describe the phylogenetic structure of communities using data from the bioCON experiment.

8 Compare the metrics

Code
library(car)
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
The following object is masked from 'package:purrr':

    some
Code
scatterplotMatrix(bioCON_CDM)

Hmmm, Kinda ugly…

Let’s use the package {corrplot} to get a more informative figure.

Code
library(corrplot)
corrplot 0.95 loaded
Code
cor_mat <- cor(bioCON_CDM, method = 'spearman')

corrplot.mixed(cor_mat, 
               tl.pos = "lt", 
               tl.cex = 0.7, 
               number.cex = 0.7, 
               addCoefasPercent = TRUE, 
               mar = c(0, 0, 1, 0))

Explore the correlation among metrics.

Code
bioCON_CDM %>% 
  cor.table()
$r
                ntaxa     pd.obs   pd.obs.z      RaoD   mpd.obs mpd.obs.z
ntaxa       1.0000000  0.9727041 0.51119419 0.7108198 0.7108198 0.5991219
pd.obs      0.9727041  1.0000000 0.69066138 0.7092764 0.7092764 0.6436876
pd.obs.z    0.5111942  0.6906614 1.00000000 0.3586867 0.3586867 0.4857613
RaoD        0.7108198  0.7092764 0.35868673 1.0000000 1.0000000 0.7870801
mpd.obs     0.7108198  0.7092764 0.35868673 1.0000000 1.0000000 0.7870801
mpd.obs.z   0.5991219  0.6436876 0.48576131 0.7870801 0.7870801 1.0000000
mntd.obs   -0.1540926 -0.1051227 0.03292653 0.2662593 0.2662593 0.2695500
mntd.obs.z  0.3391822  0.3685552 0.25569592 0.4342140 0.4342140 0.4491911
PSV         0.4001036  0.5578076 0.80079668 0.4154596 0.4154596 0.6163614
PSR         0.9853622  0.9935168 0.63260476 0.7148107 0.7148107 0.6512118
PSE         0.5993631  0.6026233 0.30473298 0.9876247 0.9876247 0.7731583
              mntd.obs mntd.obs.z        PSV        PSR       PSE
ntaxa      -0.15409256  0.3391822 0.40010361  0.9853622 0.5993631
pd.obs     -0.10512274  0.3685552 0.55780756  0.9935168 0.6026233
pd.obs.z    0.03292653  0.2556959 0.80079668  0.6326048 0.3047330
RaoD        0.26625928  0.4342140 0.41545957  0.7148107 0.9876247
mpd.obs     0.26625928  0.4342140 0.41545957  0.7148107 0.9876247
mpd.obs.z   0.26955005  0.4491911 0.61636142  0.6512118 0.7731583
mntd.obs    1.00000000  0.7974914 0.08030072 -0.1440892 0.3396013
mntd.obs.z  0.79749135  1.0000000 0.21618061  0.3294928 0.4185318
PSV         0.08030072  0.2161806 1.00000000  0.5391118 0.3872296
PSR        -0.14408917  0.3294928 0.53911184  1.0000000 0.6075197
PSE         0.33960134  0.4185318 0.38722963  0.6075197 1.0000000

$df
[1] 142

$P
                   ntaxa        pd.obs     pd.obs.z          RaoD       mpd.obs
ntaxa       0.000000e+00  5.612958e-92 5.871085e-11  1.865695e-23  1.865695e-23
pd.obs      5.612958e-92  0.000000e+00 9.893824e-22  2.558910e-23  2.558910e-23
pd.obs.z    5.871085e-11  9.893824e-22 0.000000e+00  1.012285e-05  1.012285e-05
RaoD        1.865695e-23  2.558910e-23 1.012285e-05  0.000000e+00  0.000000e+00
mpd.obs     1.865695e-23  2.558910e-23 1.012285e-05  0.000000e+00  0.000000e+00
mpd.obs.z   2.146618e-15  3.276090e-18 6.767242e-10  1.357556e-31  1.357556e-31
mntd.obs    6.517957e-02  2.098580e-01 6.952200e-01  1.256656e-03  1.256656e-03
mntd.obs.z  3.200990e-05  5.489325e-06 1.980125e-03  5.408579e-08  5.408579e-08
PSV         6.745883e-07  3.793055e-13 2.031824e-33  2.241959e-07  2.241959e-07
PSR        5.330054e-111 5.469850e-136 1.813527e-17  8.162162e-24  8.162162e-24
PSE         2.078146e-15  1.337369e-15 2.042910e-04 3.832272e-116 3.832272e-116
              mpd.obs.z     mntd.obs   mntd.obs.z          PSV           PSR
ntaxa      2.146618e-15 6.517957e-02 3.200990e-05 6.745883e-07 5.330054e-111
pd.obs     3.276090e-18 2.098580e-01 5.489325e-06 3.793055e-13 5.469850e-136
pd.obs.z   6.767242e-10 6.952200e-01 1.980125e-03 2.031824e-33  1.813527e-17
RaoD       1.357556e-31 1.256656e-03 5.408579e-08 2.241959e-07  8.162162e-24
mpd.obs    1.357556e-31 1.256656e-03 5.408579e-08 2.241959e-07  8.162162e-24
mpd.obs.z  0.000000e+00 1.086558e-03 1.630704e-08 1.972952e-16  9.844413e-19
mntd.obs   1.086558e-03 0.000000e+00 5.760136e-33 3.386902e-01  8.488789e-02
mntd.obs.z 1.630704e-08 5.760136e-33 0.000000e+00 9.257206e-03  5.515046e-05
PSV        1.972952e-16 3.386902e-01 9.257206e-03 0.000000e+00  3.145886e-12
PSR        9.844413e-19 8.488789e-02 5.515046e-05 3.145886e-12  0.000000e+00
PSE        7.114652e-30 3.125256e-05 1.786550e-07 1.629468e-06  6.833162e-16
                     PSE
ntaxa       2.078146e-15
pd.obs      1.337369e-15
pd.obs.z    2.042910e-04
RaoD       3.832272e-116
mpd.obs    3.832272e-116
mpd.obs.z   7.114652e-30
mntd.obs    3.125256e-05
mntd.obs.z  1.786550e-07
PSV         1.629468e-06
PSR         6.833162e-16
PSE         0.000000e+00

You can also plot the relationship between metrics.

Code
bioCON_CDM %>% 
  ggplot(aes(x = mpd.obs.z, y = PSV)) + 
  geom_point(size = 3, color = "darkgray") + 
  geom_smooth(method = lm, se = FALSE) +
  labs(x = "MPDz", y = "PSV") + 
  theme_CDR()
`geom_smooth()` using formula = 'y ~ x'

Code
bioCON_mds <- bioCON_CDM %>% 
  dplyr::select(-c(pd.obs.z, mpd.obs.z, mntd.obs.z)) %>% # MNDS does not like negative values
  na.omit() %>% 
  metaMDS(.)
Square root transformation
Wisconsin double standardization
Run 0 stress 0.05510432 
Run 1 stress 0.05510431 
... New best solution
... Procrustes: rmse 1.144575e-06  max resid 9.099167e-06 
... Similar to previous best
Run 2 stress 0.0551043 
... New best solution
... Procrustes: rmse 1.14821e-05  max resid 0.0001263023 
... Similar to previous best
Run 3 stress 0.05510432 
... Procrustes: rmse 1.301557e-05  max resid 0.000138871 
... Similar to previous best
Run 4 stress 0.05510432 
... Procrustes: rmse 1.97604e-05  max resid 0.000227338 
... Similar to previous best
Run 5 stress 0.05510431 
... Procrustes: rmse 8.803254e-06  max resid 9.55986e-05 
... Similar to previous best
Run 6 stress 0.0551043 
... Procrustes: rmse 4.731091e-06  max resid 4.890036e-05 
... Similar to previous best
Run 7 stress 0.05510431 
... Procrustes: rmse 9.006077e-06  max resid 9.736344e-05 
... Similar to previous best
Run 8 stress 0.05510431 
... Procrustes: rmse 5.987339e-06  max resid 4.620612e-05 
... Similar to previous best
Run 9 stress 0.0551043 
... Procrustes: rmse 5.213106e-06  max resid 4.922695e-05 
... Similar to previous best
Run 10 stress 0.05510434 
... Procrustes: rmse 2.426115e-05  max resid 0.0002790546 
... Similar to previous best
Run 11 stress 0.05510431 
... Procrustes: rmse 9.06183e-06  max resid 0.000101455 
... Similar to previous best
Run 12 stress 0.05510432 
... Procrustes: rmse 1.198856e-05  max resid 0.0001320375 
... Similar to previous best
Run 13 stress 0.05510431 
... Procrustes: rmse 8.348821e-06  max resid 9.106455e-05 
... Similar to previous best
Run 14 stress 0.2306643 
Run 15 stress 0.05510431 
... Procrustes: rmse 1.003299e-05  max resid 0.0001090415 
... Similar to previous best
Run 16 stress 0.05510431 
... Procrustes: rmse 7.258748e-06  max resid 7.71984e-05 
... Similar to previous best
Run 17 stress 0.05510435 
... Procrustes: rmse 2.313774e-05  max resid 0.0002640726 
... Similar to previous best
Run 18 stress 0.05510431 
... Procrustes: rmse 1.166813e-05  max resid 0.0001331272 
... Similar to previous best
Run 19 stress 0.0551043 
... Procrustes: rmse 6.596328e-06  max resid 7.368426e-05 
... Similar to previous best
Run 20 stress 0.1516128 
*** Best solution repeated 17 times
Code
ordiplot(bioCON_mds, type = "t", display = "species")

Let’s use {ggplot2} to make a better figure.

Code
## Metrics scores
bioCON_mds_scores <- data.frame(bioCON_mds$species) %>% 
  mutate(Metric = rownames(.))

bioCON_mds_scores %>% 
  ggplot(aes(x = MDS1, y = MDS2, color = Metric)) + 
  geom_point(size = 3, alpha = 0.5) + 
  theme_CDR()

What do you think?

Which metric would you use for your research?

The end, for now!

References

Cavender-Bares, J., Kozak, K. H., Fine, P. V. A., & Kembel, S. W. (2009). The merging of community ecology and phylogenetic biology. Ecology Letters, 12(7), 693–715. https://doi.org/10.1111/j.1461-0248.2009.01314.x
Pausas, J. G., & Verdú, M. (2010). The jungle of methods for evaluating phenotypic and phylogenetic structure of communities. BioScience, 60(8), 614–625. https://doi.org/10.1525/bio.2010.60.8.7
Pigot, A. L., & Etienne, R. S. (2015). A new dynamic null model for phylogenetic community structure. Ecology Letters, 18(2), 153–163. https://doi.org/10.1111/ele.12395
Pinto-Ledezma, J. N., Jahn, A. E., Cueto, V. R., Diniz-Filho, J. A. F., & Villalobos, F. (2019). Drivers of phylogenetic assemblage structure of the furnariides, a widespread clade of lowland neotropical birds. The American Naturalist, 193(2), E41–E56. https://doi.org/10.1086/700696
Pinto‐Ledezma, J. N., Villalobos, F., Reich, P. B., Catford, J. A., Larkin, D. J., & Cavender‐Bares, J. (2020). Testing darwin’s naturalization conundrum based on taxonomic, phylogenetic, and functional dimensions of vascular plants. Ecological Monographs, 90(4). https://doi.org/10.1002/ecm.1420
Webb, C. O., Ackerly, D. D., McPeek, M. A., & Donoghue, M. J. (2002). Phylogenies and community ecology. Annual Review of Ecology and Systematics, 33(1), 475–505. https://doi.org/10.1146/annurev.ecolsys.33.010802.150448